Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
In many real-world settings, a team of agents must coordinate its behaviour
while acting in a decentralised fashion. At the same time, it is often possible
to train the agents in a centralised fashion where global state information is
available and communication constraints are lifted. Learning joint
action-values conditioned on extra state information is an attractive way to
exploit centralised learning, but the best strategy for then extracting
decentralised policies is unclear. Our solution is QMIX, a novel value-based
method that can train decentralised policies in a centralised end-to-end
fashion. QMIX employs a mixing network that estimates joint action-values as a
monotonic combination of per-agent values. We structurally enforce that the
joint-action value is monotonic in the per-agent values, through the use of
non-negative weights in the mixing network, which guarantees consistency
between the centralised and decentralised policies. To evaluate the performance
of QMIX, we propose the StarCraft Multi-Agent Challenge (SMAC) as a new
benchmark for deep multi-agent reinforcement learning. We evaluate QMIX on a
challenging set of SMAC scenarios and show that it significantly outperforms
existing multi-agent reinforcement learning methods.
Comment: Extended version of the ICML 2018 conference paper (arXiv:1803.11485)
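The consistency guarantee described in the abstract can be stated formally. Monotonicity of the joint action-value in each per-agent value ensures that the per-agent greedy actions jointly maximise the centralised value (notation follows the QMIX paper):

```latex
\operatorname*{argmax}_{\mathbf{u}} Q_{tot}(\boldsymbol{\tau}, \mathbf{u}) =
\begin{pmatrix}
\operatorname*{argmax}_{u^1} Q_1(\tau^1, u^1) \\
\vdots \\
\operatorname*{argmax}_{u^n} Q_n(\tau^n, u^n)
\end{pmatrix},
\qquad
\frac{\partial Q_{tot}}{\partial Q_a} \ge 0 \quad \forall a .
```

The partial-derivative condition is what the non-negative weights of the mixing network enforce structurally.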
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
In many real-world settings, a team of agents must coordinate their behaviour
while acting in a decentralised way. At the same time, it is often possible to
train the agents in a centralised fashion in a simulated or laboratory setting,
where global state information is available and communication constraints are
lifted. Learning joint action-values conditioned on extra state information is
an attractive way to exploit centralised learning, but the best strategy for
then extracting decentralised policies is unclear. Our solution is QMIX, a
novel value-based method that can train decentralised policies in a centralised
end-to-end fashion. QMIX employs a network that estimates joint action-values
as a complex non-linear combination of per-agent values that condition only on
local observations. We structurally enforce that the joint-action value is
monotonic in the per-agent values, which allows tractable maximisation of the
joint action-value in off-policy learning, and guarantees consistency between
the centralised and decentralised policies. We evaluate QMIX on a challenging
set of StarCraft II micromanagement tasks, and show that QMIX significantly
outperforms existing value-based multi-agent reinforcement learning methods.
Comment: Camera-ready version, International Conference on Machine Learning 2018
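The key idea shared by both abstracts can be illustrated numerically. The sketch below is not the paper's neural mixing network; it is a minimal stand-in (a linear combination with non-negative weights, all names illustrative) that shows why monotonicity makes the decentralised greedy actions coincide with the centralised argmax over joint actions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_actions = 3, 4

# Per-agent action-values Q_a(tau_a, u_a), one row per agent,
# each conditioning only on that agent's local observations.
agent_qs = rng.normal(size=(n_agents, n_actions))

# Toy "mixing network": a single linear layer whose weights are forced
# non-negative (here via absolute value), so Q_tot is monotonically
# non-decreasing in every agent's value.
w = np.abs(rng.normal(size=n_agents))  # non-negative mixing weights
b = rng.normal()                       # bias (state-dependent in QMIX; scalar here)

def q_tot(chosen_qs):
    """Joint action-value as a monotonic combination of per-agent values."""
    return float(w @ chosen_qs + b)

# Decentralised greedy choice: each agent maximises its own Q independently.
greedy = agent_qs.argmax(axis=1)

# Centralised greedy choice: exhaustive maximisation of Q_tot over all
# n_actions ** n_agents joint actions.
best_joint, best_val = None, -np.inf
for joint in np.ndindex(*([n_actions] * n_agents)):
    val = q_tot(agent_qs[np.arange(n_agents), list(joint)])
    if val > best_val:
        best_joint, best_val = joint, val

# Monotonicity guarantees the two coincide, which is what makes greedy
# decentralised execution consistent with centralised off-policy learning.
assert tuple(greedy) == best_joint
```

With positive weights each term of the sum is maximised independently, so the joint maximisation decomposes per agent; this is exactly what makes the centralised argmax tractable without enumerating the joint action space at execution time.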
The StarCraft Multi-Agent Challenge
In the last few years, deep multi-agent reinforcement learning (RL) has
become a highly active area of research. A particularly challenging class of
problems in this area is partially observable, cooperative, multi-agent
learning, in which teams of agents must learn to coordinate their behaviour
while conditioning only on their private observations. This is an attractive
research area since such problems are relevant to a large number of real-world
systems and are also more amenable to evaluation than general-sum problems.
Standardised environments such as the ALE and MuJoCo have allowed single-agent
RL to move beyond toy domains, such as grid worlds. However, there is no
comparable benchmark for cooperative multi-agent RL. As a result, most papers
in this field use one-off toy problems, making it difficult to measure real
progress. In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC)
as a benchmark problem to fill this gap. SMAC is based on the popular real-time
strategy game StarCraft II and focuses on micromanagement challenges where each
unit is controlled by an independent agent that must act based on local
observations. We offer a diverse set of challenge maps and recommendations for
best practices in benchmarking and evaluations. We also open-source a deep
multi-agent RL framework including state-of-the-art algorithms. We
believe that SMAC can provide a standard benchmark environment for years to
come. Videos of our best agents for several SMAC scenarios are available at:
https://youtu.be/VZ7zmQ_obZ0